Reference Guide: Kubernetes Data Protection Strategies

Protecting data stored in a Kubernetes cluster is a critical concern for administrators seeking to maximize security, meet compliance requirements, and enable effective disaster recovery. You must implement appropriate strategies and best practices to ensure your Kubernetes clusters can safely store sensitive data. 

This article explains how to implement data protection strategies within a Kubernetes cluster, highlighting the importance of enabling data encryption, developing a backup strategy, and leveraging native Kubernetes security controls.

Summary of key concepts related to Kubernetes data protection

  • Why data protection matters: Data protection is an important aspect of any production Kubernetes cluster. Unauthorized access to sensitive information is a risk to any organization, so strong data protection best practices must be implemented to limit the risk of reputation damage and business loss.
  • Encrypting data for enhanced security at rest and in transit: Encryption is a cornerstone of improving data security. You can encrypt storage media like persistent volumes, databases, and secrets to mitigate data leaks. Encryption in transit can be enabled with ingress TLS and service mesh mTLS features, ensuring all traffic is encrypted throughout the cluster.
  • Implementing native Kubernetes security controls: Kubernetes provides many native security controls you should evaluate and enable to maximize cluster security. These include RBAC, network policies, Pod Security Standards (PSS), and Pod Security Admission (PSA). These controls limit the “blast radius” of compromised pods and restrict unwanted access to data.
  • How backup strategies improve data security: Implementing a backup strategy is critical for data protection. Backups enable you to recover from outages and mitigate malware attacks by providing clean snapshots to restore from. Carefully evaluating your backup strategy is required for a secure cluster.

Why data protection matters

Data protection within a Kubernetes cluster is a key priority when designing a secure environment, especially for clusters hosting sensitive information such as customer data. Unauthorized access to data stored within a cluster can lead to data breaches, disrupting business operations and eroding the public’s trust in the affected organization.

Financial liability, loss of business, and reputation damage are all significant risks for Kubernetes environments that do not follow data protection best practices. You must implement best practices such as data encryption, backups, and access controls to ensure your Kubernetes clusters can prevent or quickly recover from potential breaches or data loss.

Data encryption at rest and in transit for enhanced security

Encryption plays a significant role in ensuring that data remains secure and inaccessible to unauthorized users. Encryption at rest and in transit are two fundamental aspects of data security that you need to address to safeguard your data effectively.

Encrypting data at rest

Data at rest refers to any data in storage, such as persistent volumes (PVs) and secrets in Kubernetes. Encrypting this data means that even if someone were to gain physical access to the storage, the data would be unreadable without the relevant encryption keys.

  • Encrypting persistent volumes: Data stored in persistent volumes can be encrypted by leveraging the Container Storage Interface (CSI) driver installed on the cluster. For example, Amazon Elastic Kubernetes Service (EKS) users can implement the Elastic Block Store (EBS) CSI driver to manage EBS PVs for EKS clusters. The EBS CSI driver can create and delete persistent volumes and encrypt them with your encryption keys. The encryption settings are defined within a storage class object, which serves as a template for the EBS persistent volumes it provisions. The example below shows a storage class that enables EBS persistent volume encryption and specifies an encryption key. Any persistent volume provisioned with the storage class named “encrypted” will be automatically encrypted by the EBS CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: encrypted
parameters:
  encrypted: "true"
  kmsKeyId: arn:aws:kms:<redacted>:key/<redacted>
  type: gp3
provisioner: ebs.csi.aws.com
  • Encrypting secrets in etcd: Secrets are the most sensitive type of Kubernetes object. Kubernetes natively supports automatically encrypting secrets in etcd, the database that stores all Kubernetes objects in the cluster, ensuring they are not stored in plain text.

    The right approach to encrypting secrets depends on the type of Kubernetes cluster you’re operating. Self-managed clusters, where you have access to the API server, require creating an encryption configuration object containing encryption keys; the API server uses these keys to encrypt secrets before saving them to etcd (a sample configuration is shown after this list). If you’re using a managed Kubernetes service like EKS, you can enable the envelope encryption feature to implement secrets encryption.
  • Encrypting databases: Database resources running within the cluster (such as via the MySQL operator) or outside the cluster (such as AWS RDS) should be encrypted to secure data at rest. Encrypting databases is a standard security best practice, mainly because databases are the most common data storage location and often hold sensitive information. All modern database software supports encryption at rest, and you should leverage these features to ensure that a compromised database doesn’t result in leaked data. Other security features native to your database software, like user access control policies, should also be leveraged to mitigate unnecessary access to the database.
  • Encrypting other storage: Consider all other storage infrastructures your clusters utilize and evaluate whether they offer encryption features. For example, S3 buckets, Redis caches, and network filesystems may also need to be encrypted if they are used by your cluster. 
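
For self-managed clusters, the encryption configuration mentioned above is an EncryptionConfiguration object passed to the API server via its --encryption-provider-config flag. Below is a minimal sketch; the key name and base64 value are placeholders you would generate yourself:

apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets                # encrypt secret objects before writing to etcd
    providers:
      - aescbc:                # AES-CBC with PKCS#7 padding
          keys:
            - name: key1
              secret: <base64-encoded 32-byte key>   # placeholder; generate your own
      - identity: {}           # fallback so existing plain-text secrets remain readable

After the API server restarts with this configuration, newly written secrets are encrypted; existing secrets can be re-encrypted by rewriting them (for example, with kubectl get secrets --all-namespaces -o json | kubectl replace -f -).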

It is also important to securely manage your encryption keys to maintain a strong security posture. To this end, follow best practices such as leveraging a key management system (KMS), regularly rotating encryption keys, restricting access to keys, and enabling access auditing.

Encrypting data in transit

Data in transit refers to data moving over a connection, such as between microservices, from applications to databases, or from clients to web servers. Encrypting this data ensures that it remains secure as it travels across the network:

  • Ingress TLS: Implement TLS for ingress traffic to encrypt traffic from external sources. Enabling TLS via your ingress objects ensures that all external connections made to your cluster’s workloads (such as web applications) are secured. The YAML snippet below shows an ingress object implementing TLS for the example.com hostname using a Kubernetes secret called “tls-secret”. Deploying these objects in a cluster with an ingress controller will result in incoming traffic for the example.com host being automatically encrypted.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  tls:
  - hosts:
      - example.com
    secretName: tls-secret
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: webapplication
            port:
              number: 80
---
apiVersion: v1
kind: Secret
metadata:
  name: tls-secret
data:
  tls.crt: <base64-encoded certificate>
  tls.key: <base64-encoded key>
type: kubernetes.io/tls

  • Microservice mTLS: Mutual TLS (mTLS) can provide two-way verification of communication among microservices within your cluster. In this setup, each microservice utilizes encryption keys for all connections, which improves cluster security because all communication across the cluster’s microservices is encrypted. Additionally, each microservice presents client and server certificates to validate each other’s identity for each connection. Service meshes like Istio can help you enable mTLS for all microservices in a cluster, making it easy to secure internal traffic without modifying your applications directly. The example below shows an Istio peer authentication object that enforces mTLS for all connections within the service mesh:

apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
spec:
  mtls:
    mode: STRICT

Backup encryption

Any Kubernetes environment where security and disaster recovery are essential must ensure that backups are also encrypted. Leveraging a solution that provides encryption for backups is valuable for maintaining a solid security posture while enabling disaster recovery. A solution like Trilio can encrypt your backups before they leave your cluster, providing an additional layer of security, so even if your backup data were intercepted, it would remain protected. The snippet below shows a Kubernetes backup plan object provided by Trilio to enable encrypted backups with custom encryption keys:

apiVersion: triliovault.trilio.io/v1
kind: BackupPlan
metadata:
  name: sample-application
spec:
  encryption:
    encryptionSecret:
      name: sample-secret
      namespace: BACKUPPLAN_NAMESPACE
  backupConfig:
    target:
      name: sample-target
    retentionPolicy:
      name: sample-retention-policy
  backupPlanComponents:
    helmReleases:
      - mysql
---
apiVersion: v1
kind: Secret
metadata:
  name: sample-secret
type: Opaque
data:
  encryptKey: bXllbmNyeXB0aW9ua2V5

Implementing Kubernetes application data security controls

You can leverage several native Kubernetes features to enhance the security posture of your applications and thus protect your data. Securing access to your Kubernetes objects, limiting pod privileges, and restricting network traffic improve data security by mitigating unwanted access to data and limiting the blast radius of compromised workloads.

Network policies

By default, all pods can communicate with each other. Kubernetes network policies allow you to control traffic among pods, effectively determining which workloads can communicate with each other. Implementing network policies lets you enforce a default deny network posture, explicitly allowing certain connections based on your application’s needs and significantly reducing the attack surface. A compromised pod will now have a limited blast radius due to limited network connectivity to other pods and their data.
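
As a minimal sketch, the policy below (assuming a hypothetical namespace named app) enforces that default deny posture by selecting every pod in the namespace and allowing no ingress or egress traffic; you would then layer additional policies that explicitly allow required connections:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: app        # hypothetical namespace for illustration
spec:
  podSelector: {}       # an empty selector matches every pod in the namespace
  policyTypes:          # listing both types with no allow rules denies all traffic
    - Ingress
    - Egress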

Security contexts, Pod Security Standards (PSS), and Pod Security Admission (PSA)

Security contexts enable you to specify a pod’s privilege level, such as whether it runs as root or receives additional kernel capabilities. Taking a default deny approach, where pods are granted no special privileges unless explicitly required, ensures that only approved pods can access them.
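
As an illustrative sketch, the pod below (the name and image are hypothetical) uses a restrictive security context that follows this default deny mindset:

apiVersion: v1
kind: Pod
metadata:
  name: web               # hypothetical pod
spec:
  containers:
    - name: web
      image: nginx:1.25   # hypothetical image
      securityContext:
        runAsNonRoot: true               # refuse to start if the image runs as root
        allowPrivilegeEscalation: false  # block setuid-style privilege gains
        readOnlyRootFilesystem: true     # make the container filesystem immutable
        capabilities:
          drop:
            - ALL                        # remove all Linux kernel capabilities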

To enable policies that control privileged security contexts, Kubernetes offers PSS and PSA. PSS defines three security profiles (privileged, baseline, and restricted), and PSA enforces them at the namespace level. You can implement PSS and PSA to control which workloads and namespaces can operate with special privileges, ensuring pods adhere to essential security requirements and significantly mitigating the risk of security breaches within the cluster.
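
PSA is enabled by labeling namespaces. As a brief sketch, assuming a namespace named production, the labels below enforce the restricted profile while also warning clients and recording audit events for violations:

apiVersion: v1
kind: Namespace
metadata:
  name: production      # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted  # reject non-compliant pods
    pod-security.kubernetes.io/warn: restricted     # return warnings to clients
    pod-security.kubernetes.io/audit: restricted    # annotate audit log events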

Alternative third-party projects like Gatekeeper and Kyverno offer a policy engine approach that can enforce rules on any Kubernetes object field, allowing you to implement more granular and secure rules. For example, you could restrict pods to using container images only from a trusted container registry, reducing the likelihood of a malicious image being deployed from a compromised registry.
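
As a hedged sketch of this policy engine approach, a Kyverno cluster policy along these lines could require that pod images come from a hypothetical trusted registry (registry.example.com):

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce   # reject violating pods instead of only warning
  rules:
    - name: trusted-registry-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from registry.example.com."
        pattern:
          spec:
            containers:
              - image: "registry.example.com/*"   # wildcard match on the image field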

Role-based access control (RBAC)

Kubernetes has built-in RBAC support, allowing you to control which users can perform actions like creating, deleting, and getting specific Kubernetes objects. For example, you might restrict access to secrets or config maps to only those users or applications that absolutely need it or limit developers on your team to particular namespaces. Leveraging Kubernetes RBAC is essential for ensuring that objects are only viewed or modified by allowed users, avoiding the possibility of either accidental or malicious changes to the cluster. Data protection best practices will require you to restrict access to secrets, config maps, and persistent volumes to authorized users only.
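
As a minimal sketch, assuming a hypothetical namespace named app and a user named alice, the role and binding below grant read-only access to secrets in that one namespace:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: secret-reader
  namespace: app                 # hypothetical namespace
rules:
  - apiGroups: [""]              # "" refers to the core API group
    resources: ["secrets"]
    verbs: ["get", "list"]       # read-only; no create, update, or delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-secrets
  namespace: app
subjects:
  - kind: User
    name: alice                  # hypothetical user
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: secret-reader
  apiGroup: rbac.authorization.k8s.io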

Protecting Kubernetes data with backups

Backups are a cornerstone of any robust data protection strategy, ensuring data availability and integrity. Backups also play a crucial role in recovering from security incidents. For example, in a malware attack that compromises your Kubernetes data, having clean, unaffected backups allows you to restore your data to a pre-attack state, minimizing downtime and the breach’s impact. This capability to revert your cluster data to a known good state is invaluable for maintaining business continuity and protecting against data loss.

A solid backup and disaster recovery plan is critical to your data protection strategy. Implementing proper backup protocols ensures that your data is safe and quickly recoverable in the event of a disaster (such as an attack or accidental data deletion). Here are some essential practices to follow:

  • Assess your data: Start by gathering details about what data is stored in your cluster, where it is located, its sensitivity, and any other information needed to comprehensively understand your data storage. This information will help inform decisions about selecting an appropriate tool and implementing proper backup procedures.
  • Analyze RPO and RTO requirements: The recovery point objective (RPO) and recovery time objective (RTO) are critical metrics in disaster recovery planning. The RPO defines the maximum acceptable data loss, measured as the age of the most recent recoverable backup; the RTO defines the maximum acceptable downtime before normal operations must resume. For example, an RPO of one hour requires taking backups at least hourly, while an RTO of 30 minutes means restoration must complete within 30 minutes of an outage. These targets drive the rest of the disaster recovery strategy, such as how often backups must be taken and how quickly they must be restorable. Automation can improve both: automated schedules enable more frequent backups (better RPO), and automated restore workflows reduce recovery time (better RTO).

Following a disaster event, impact is visible as both data loss and downtime. RPO and RTO objectives define a target for how well impact can be limited following a disaster.

  • Plan a backup protocol: Based on the requirements above, plan your backup strategy by evaluating what data needs to be backed up, how frequently backups should be done, where the backups will be stored, how long backups should be retained, and the restoration steps. The plan must be tailored to your requirements, particularly regarding the backup frequency and restoration process.
  • Choose the right tool: Select a tool that meets your requirements, including supporting your desired data storage sources, integrating with backup targets like S3, and allowing you to tailor an automated backup schedule.
  • Document your plan: Documenting your backup strategy and recovery process is important to ensure your team is well informed about it. The team should routinely evaluate whether backups are working as expected, whether any improvements are necessary, and whether the documented recovery steps are clear enough to follow quickly during a disaster recovery event.
  • Test your recovery process: Routinely test your backups by restoring a snapshot and verifying that the data is restored correctly. This will help verify that your recovery steps are accurate, the backup process is still working, and the backup tool is operating correctly.

A tool that can help you to implement a comprehensive backup solution is Trilio, which offers an advanced holistic approach to backups in Kubernetes with many valuable features for users focusing on data protection and disaster recovery:

  • Backup and recovery: Trilio allows you to create and manage backups of Kubernetes objects (such as pods, Helm charts, and operators) and the data stored on persistent volumes. In case of data loss or corruption, backups can be restored partially for selected objects or the whole cluster.
  • Consistency: Data that is being actively modified by a live application can be challenging to back up—data may be modified while a backup is in progress, causing the backup to have an inconsistent state with malformed data. Trilio ensures data consistency when creating backups of live clusters by allowing application-specific hooks to flush in-memory data to disk before snapshot creation. This feature ensures that backups are application-consistent and can be restored reliably.
  • Incremental backups: You can save space and network bandwidth by only backing up the data changed since the last backup, which is called creating incremental backups. Users can leverage Trilio to generate incremental backups indefinitely and rely on the tool to merge each incremental backup intelligently during a restore. Backups are generated using the QCOW2 open format, enabling you to leverage community tools to access Trilio’s snapshot data. The snapshots are also restorable to any standard Kubernetes cluster.
  • Multi-tenant, multi-cluster, and multi-cloud capabilities: By leveraging Kubernetes RBAC, Trilio allows you to define precise access controls for multiple tenants on the same cluster. This lets multiple tenants manage their own backups without accessing each other’s data. Trilio’s architecture also enables multi-cluster support, allowing you to manage backups for multiple clusters from a single pane of glass. Trilio supports integration with various platforms (like AWS, Google, and OpenShift), allowing you to quickly implement a multi-cloud backup solution with a single tool.
  • S3-compatible backup targets and NFS support: Trilio supports storing snapshots on NFS and S3-compatible storage, like AWS S3. A typical pattern is to store backups on AWS S3 and then leverage S3’s Object Lock feature to ensure that backups are immutable and cannot be modified either maliciously or accidentally. Additionally, S3 lifecycle policies can migrate older backups to Glacier for cost-efficient long-term storage. Trilio’s native integration with AWS S3 can be easily set up with a “target” Kubernetes object, as seen below:
apiVersion: triliovault.trilio.io/v1
kind: Target
metadata:
  name: demo-s3-target
spec:
  type: ObjectStore
  vendor: AWS
  objectStoreCredentials:
    region: us-east-1
    bucketName: trilio-backup-test
    credentialSecret:
      name: sample-secret
      namespace: TARGET_NAMESPACE
  thresholdCapacity: 5Gi

Conclusion

Throughout this article, we’ve explored the critical aspects of data protection within Kubernetes environments, emphasizing the necessity of strict access controls, encryption, and the strategic use of backups. Implementing access controls like RBAC and network policies and restricting pod security contexts improve data security by mitigating unwanted and unnecessary access to Kubernetes objects.

Adopting a default deny approach ensures data protection by reducing an attacker’s ability to access data and the blast radius of compromised pods. 

Enabling encryption at rest and in transit ensures that data is protected even when obtained by an attacker. Enabling encryption for persistent volumes, ingress traffic, and service meshes helps reduce the likelihood of data leaks and the resulting loss of public trust.

Adopting a backup protocol to regularly back up cluster data allows you to protect against malware infections, improve disaster recovery capabilities, and reduce the impact of data loss events.

Trilio is an advanced backup solution that provides a wide range of functionality to enable backup strategies in any Kubernetes cluster. It allows you to better secure your data, improve disaster recovery times, and mitigate malware by restoring clean snapshots. We recommend that users evaluate their clusters’ data protection strategies and implement best practices like access controls, encryption, and backups to improve their Kubernetes environment security posture.